Evaluation Techniques Applied to Domain Tuning of MT Lexicons

Authors

  • Necip Fazıl Ayan
  • Bonnie J. Dorr
  • Okan Kolak
Abstract

We describe a set of evaluation techniques applied to domain tuning of bilingual lexicons for machine translation. Our overall objective is to translate a domain-specific document in a foreign language (in this case, Chinese) into English. First, we perform an intrinsic evaluation of the effectiveness of our domain-tuning techniques by comparing our domain-tuned lexicon to a manually constructed domain-specific bilingual termlist. Our results indicate that we achieve 66% recall and 95% precision with respect to a human-derived gold standard. Next, an extrinsic evaluation demonstrates that our domain-tuned lexicon improves BLEU scores by 50% over a statistical system, with a smaller improvement when the system is trained on a uniformly weighted dictionary.
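The intrinsic evaluation described above can be illustrated with a minimal sketch. It assumes that lexicon entries are compared to the gold-standard termlist as exact (source term, English translation) pairs; the abstract does not state the matching criterion, so this is an assumption. The function name `precision_recall` and the toy entries are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# An entry is a (source term, English translation) pair.
Entry = Tuple[str, str]


def precision_recall(proposed: Set[Entry], gold: Set[Entry]) -> Tuple[float, float]:
    """Return (precision, recall) of proposed entries against a gold set.

    Assumption: an entry counts as correct only on an exact pair match.
    """
    correct = proposed & gold
    precision = len(correct) / len(proposed) if proposed else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data with placeholder source terms, not from the paper.
gold = {("src_a", "missile"), ("src_b", "launch"), ("src_c", "warhead")}
tuned = {("src_a", "missile"), ("src_b", "launch")}

p, r = precision_recall(tuned, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=1.00 recall=0.67
```

In this toy case every proposed entry is correct but one gold entry is missing, so precision is higher than recall, the same direction as the 95% precision and 66% recall reported in the abstract.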

Related resources

Domain Tuning of Bilingual Lexicons for MT

Our overall objective is to translate a domain-specific document in a foreign language (in this case, Chinese) to English. Using automatically induced domain-specific, comparable documents and language-independent clustering, we apply domain-tuning techniques to a bilingual lexicon for downstream translation of the input document to English. We will describe our domain-tuning technique and demo...

Two Principles and Six Techniques for Rapid MT Development

In this paper we describe a range of techniques used at NMSU CRL for accelerating the development of MT systems. These techniques enable semi-automatic development of a number of components of a multilingual MT system, thereby enabling rapid deployment of MT capabilities in a new language. First, we describe the core multi-engine, multilingual architecture that enables the different techniques ...

Multilingual lexicons for related languages

The great increase in work on the lexicon by computational and theoretical linguists throughout the s has concerned itself almost exclusively with monolingual lexicons. Meanwhile, applied work on multilingual lexicons, mostly for machine translation (MT), has employed monolingual lexicons linked only at the level of semantics. In this paper we argue that the traditional MT lexicon architecture, while a...

Construction-Based MT Lexicons

This paper presents a novel view of the boundary between the generalizable and the idiosyncratic in MT lexicons. We argue that the domain of the idiosyncratic should, in fact, be broader than in most current approaches. While at present most MT systems involve phrasal lexicons, these typically contain terminology from a particular field. In order to facilitate naturalness of translation, specif...

MTriage: Web-enabled Software for the Creation, Machine Translation, and Annotation of Smart Documents

Progress in the Machine Translation (MT) research community, particularly for statistical approaches, is intensely data-driven. Acquiring source language documents for testing, creating training datasets for customized MT lexicons, and building parallel corpora for MT evaluation require translators and non-native speaking analysts to handle large document collections. These collections are furt...

Publication date: 2003